ABSTRACT

The purposes of this study were to examine the
reliability and validity of a basal reading series mastery test, and
to explore the appropriateness and usefulness of two strategies for
investigating the reliability and validity of criterion-referenced
tests. Subjects were 47 sixth graders, who were tested on the SRA
Reading Achievement Test, the Houghton-Mifflin End-of-level 11 Basic
Reading Test (BRT), and the Word Reading Test. A subgroup of 20
children was tested a second time on the BRT. Traditional
psychometric correlational analyses as well as specific strategies
for examining the adequacy of criterion-referenced tests were applied
to the data to investigate the following dimensions of the technical
adequacy of the BRT: (1) consistency of student performance across
two administrations of the BRT, and (2) criterion validity of the BRT
scores with respect to two other measures of reading proficiency and
criterion validity of the BRT mastery/nonmastery decisions with
respect to pre/post instructional status. Results indicated that the
reliability and validity of the BRT were less than adequate, and that
both strategies for investigating the adequacy of a
criterion-referenced test were useful and provided complementary
information. Implications for the development and use of
criterion-referenced instruments are discussed. (Author)
Abstract

The purposes of this study were to (a) examine the reliability
and validity of a basal reading series mastery test, and (b) explore
the appropriateness and usefulness of two strategies for investigating
the reliability and validity of criterion-referenced tests. Subjects
were 47 sixth graders, who were tested on the SRA Reading Achievement
Test, the Houghton-Mifflin End-of-level 11 Basic Reading Test (BRT),
and the Word Reading Test. A subgroup of 20 children was tested a
second time on the BRT. Traditional psychometric correlational
analyses as well as specific strategies for examining the adequacy of
criterion-referenced tests were applied to the data to investigate the
following dimensions of the technical adequacy of the BRT: (a)
consistency of student performance across two administrations of the
BRT, and (b) criterion validity of the BRT scores with respect to two
other measures of reading proficiency and criterion validity of the
BRT mastery/nonmastery decisions with respect to pre/post
instructional status. Results indicated that the reliability and
validity of the BRT were less than adequate, and that both strategies
for investigating the adequacy of a criterion-referenced test were
useful and provided complementary information. Implications for the
development and use of criterion-referenced instruments are discussed.




The Technical Adequacy of a Basal Reading 
Series Mastery Test 

With the growing demand for accountability in the schools, the 
focus on educational tests has expanded. Norm-referenced achievement 
testing, the traditional measurement format, is the predominant 
measurement strategy for evaluating and documenting program effects. 
Concurrent with its frequent use, however, is growing recognition that 
norm-referenced measurement may be inadequate for its intended 
purposes: It has poor content validity with respect to classroom 
curricula, and it fails to indicate the extent to which individuals or 
groups have mastered specific educational objectives (Skager, 1971). 

As an alternative to traditional educational measurement,
criterion-referenced (CR) testing has received greater attention in
the past two decades from measurement theorists, test developers, and
school personnel. As conceptualized by Glaser and Nitko (1971), the
CR test is a sample of items yielding information that is
interpretable directly with respect both to a well-defined domain of
tasks and to specified performance standards. This definition
reflects three characteristics that frequently are employed in the
literature to describe CR measurement: (a) definition of a
well-specified content domain (Baker, 1974; Hambleton & Novick, 1973;
Millman, 1974), (b) delineation of valid performance criteria
(Hambleton, 1980), and (c) development of procedures for generating
appropriate samples of tests (Goodstein, 1982; Hambleton, Swaminathan,
Algina, & Coulson, 1978; Popham, 1980). All three components stress
the edumetric and psychometric properties of CR tests.

Nevertheless, the focus both in publishing houses and in the
schools has been more utilitarian. With the recognition that CR tests
provide relevant data for describing student progress with respect to
specific learning objectives, their use has proliferated. Test
developers have marketed CR instruments along with objective banks;
commercial curriculum writers have published CR tools for assessing
mastery within their series; school districts have created their own
CR tests; and teachers have developed such instruments to fit
individual learning objectives. Unfortunately, there has been a lack
of concomitant investigation of the reliability and validity of these
tests.

Therefore, although two measurement formats currently are
available and used in educational settings, neither is adequate for
evaluating the effects of instructional programs. While
norm-referenced tests frequently demonstrate several strong psychometric
characteristics, they lack content validity and utility. Alternatively,
CR instruments are isomorphic with respect to classroom curricula and,
as such, appear very useful; however, there is little evidence that
such measurement is accurate or meaningful.

The current study addressed part of this dilemma by beginning the
task of investigating the reliability and validity of available CR
tests. Traditional ways of assessing such adequacy, however, have
been criticized as largely inappropriate for CR instruments (Popham &
Husek, 1969). Hambleton and Novick (1973) reasoned that, because one
of the purposes of a CR test is to identify mastery within a domain,
test variance typically is small. Homogeneous distributions of test
scores are centered at the low and high ends of the measurement scale,
respectively representing pre- and post-instruction performance
(Hambleton & Novick, 1973). When the variance of test scores is
restricted in this way, correlational estimates of reliability and
validity tend to be low. In response to this problem, alternative
analyses for investigating the adequacy of CR tests have been
developed (Berk, 1980); in contrast to the correlation statistic,
these analyses rely minimally on the notion that inter-individual
variability is necessary (Carver, 1970; Hambleton & Novick, 1973;
Huynh, 1976; Subkoviak, 1975).
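
The effect of restricted variance on a correlation coefficient can be illustrated with a small numerical sketch. The scores below are invented for illustration and are not data from this study; `pearson_r`, `form_a`, and `form_b` are hypothetical names. The sketch computes a Pearson correlation for a full-range group and again for only the high scorers, as might be observed after instruction.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented scores on two administrations of a 10-point test (illustration only).
form_a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
form_b = [1, 0, 3, 2, 5, 4, 7, 6, 9, 8]

# Correlation across the full range of scores.
r_full = pearson_r(form_a, form_b)

# Keep only examinees near the top of the scale (restricted variance).
high = [(a, b) for a, b in zip(form_a, form_b) if a >= 6]
r_restricted = pearson_r([a for a, _ in high], [b for _, b in high])

print(round(r_full, 2), round(r_restricted, 2))
```

In this invented example every examinee's two scores differ by exactly one point in both groups, yet the correlation falls from about .94 in the full group to .60 in the restricted group. The drop reflects the narrower score distribution, not any change in how consistently individuals perform.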

Despite the development of such analyses, it appears that 
developers of commercial CR instruments, if they address technical 
adequacy at all, still rely predominantly on traditional psychometric 
correlational analyses. Inspection of eight commercial
criterion-referenced instruments and four basal mastery tests revealed that (a)
only one-third of the test manuals addressed reliability and validity
at all, and (b) only traditional analyses were employed in the
investigations of the instruments' technical adequacy (see Table 1).



Insert Table 1 about here 



In the present study, both traditional correlational statistics
and alternative CR approaches were employed to examine the adequacy of
one CR instrument developed and published by a reading series company.
The purpose of this study was twofold. First, the investigation was
designed to contrast results based on the traditional and alternative
approaches to studying the technical adequacy of CR instruments. Such
a contrast should shed light on the appropriateness and potential
usefulness of each strategy. The second purpose was to describe the
reliability and validity of the specific CR measure examined. Despite
widespread use of this test, there are few, if any, reports concerning
its adequacy.¹ The investigation of the test's reliability and
validity should provide information of interest not only to consumers
of this measure but also to users of other CR tests for which
technical data also are still unavailable.

Method 

Subjects 

Subjects were 47 students (20 M, 27 F) from two sixth-grade
classes. Each class represented a school district within a rural 
midwestern educational cooperative. The students' mean reading 
percentile rank was 51.48 (SD = 18.11) as measured on the Science 
Research Associates (SRA) Reading Achievement Test. 
Measures 

Three measures of reading performance were used in the study: a
basal series criterion-referenced test, a global norm-referenced test,
and a curriculum-based word reading test.

Criterion-referenced test. Three scales of the End-of-level 11
Basic Reading Test (BRT; Brzeinski & Schoephoerster, 1974) of the
Houghton-Mifflin basal reading series were employed as measures. Each
of the three scales, Decoding Skills, Comprehension Skills, and
Reference/Study Skills, comprises several subtests. Table 2
lists the subtests constituting each scale and provides brief
descriptions of tasks the examinee is required to do within each
subtest. The BRT is designed as a criterion-referenced test, with
items per subtest ranging from 6 to 12 and with mastery-nonmastery
cutoff scores established at 83% to 85% correct responses.



Insert Table 2 about here

Norm-referenced test. The Science Research Associates (SRA)
Reading Achievement Test (Naslund, Thorpe, & Lefever, 1978) is
comprised of two subtests: vocabulary and comprehension. In the
vocabulary section, examinees are required to select, from four
alternatives, a synonym for an underlined word in a sentence. In the
comprehension section, examinees read 200-300 word passages and answer
questions in a multiple choice format. Total test score is based on a
linear combination of the two subtests. Internal consistency
reliability was reported at .88 (Salvia & Ysseldyke, 1981).

Curriculum-based word reading test. The Word Reading Test (Deno,
Mirkin, & Chiang, 1982) requires children to read aloud passages and
isolated word lists and is scored in terms of average numbers of words
correct and incorrect over two alternate forms of the Isolated Word
Reading and Passage Reading scales. The 200-word passages are drawn
randomly from a student's grade-appropriate level basal reading book;
the 150-word lists sample words randomly from basals, with 60% of
words drawn from the student's grade-appropriate level and 40% sampled
equally from all previous levels. For the Passage Reading and Isolated
Word Reading scales, test-retest and alternate form reliabilities were at
least .90 (Fuchs, Deno, & Marston, in press; Fuchs, Wesson, Tindal,
Mirkin, & Deno, 1981).
Procedure 

All students were tested in groups, by a school psychologist for
the SRA Reading Achievement Test, and by their classroom teachers for
the BRT. The Word Reading Test was administered individually by
trained aides. Standardized administration procedures were followed
on all tests. Testing time ranged from 60 to 90 minutes for the SRA
test, 60 to 90 minutes for the BRT, and five to six minutes for the
Word Reading Test. All testing was completed within a two-week
period.

To address test-retest reliability questions, a subgroup of 20
students (9 F) was administered the measures in the following
order: BRT, SRA Reading Achievement Test, Word Reading Test, and BRT
again. For the remaining 27 students, each measure was given one
time, with the order of administration random.
Data Analysis 

Consistency of performance on two administrations of the same
test. Consistency of students' performance on the BRT was assessed in
three ways. In all three analyses, the students who had been tested
twice on the BRT (N=20) were the subjects. First, traditional
test-retest reliability was determined by correlating scores from the two
administrations of the BRT. The other two analysis strategies were
designed specifically for criterion-referenced measures (see Millman,
1974). In the first of these, consistency of students' subtest scores
was determined by (a) computing individuals' percentage correct score
on each subtest for each administration of the BRT, (b) calculating
for each individual his/her difference score across the two
administrations of each subtest, and (c) determining the percentages
of examinees having each possible difference score on each subtest.
In the second strategy, consistency of mastery-nonmastery decisions on
subtests was determined by dividing the difference between observed
and chance proportions of agreements in decisions by the maximum value
that difference could assume. (The chance proportion of agreements
was computed by multiplying and then summing the marginal proportions
of the same decision categories for the two administrations, as done
in a chi-square test of association.)
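
The two criterion-referenced consistency analyses described above can be sketched as follows. This is a minimal illustration, not the authors' computation: the function names and the example data are hypothetical, and the sketch assumes that the maximum value the difference between observed and chance agreement can take is one minus the chance proportion, which makes the index equal to Cohen's kappa. The text does not state which maximum was used.

```python
from collections import Counter

def difference_score_percentages(pct_first, pct_second):
    """Percentage of examinees at each absolute difference in percent correct."""
    diffs = [abs(a - b) for a, b in zip(pct_first, pct_second)]
    counts = Counter(diffs)
    n = len(diffs)
    return {d: 100.0 * c / n for d, c in sorted(counts.items())}

def corrected_agreement(first, second):
    """Return (observed, chance-corrected) agreement for 'M'/'N' decisions.

    Assumption: the maximum of (observed - chance) is (1 - chance),
    so the corrected value equals Cohen's kappa.
    """
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: multiply matching marginal proportions, then sum.
    chance = sum(
        (first.count(cat) / n) * (second.count(cat) / n) for cat in ("M", "N")
    )
    if chance == 1.0:
        return observed, float("nan")  # undefined when everyone is in one category
    return observed, (observed - chance) / (1.0 - chance)

# Invented decisions for 18 examinees: 16 masters on both occasions,
# and 2 examinees whose decision flipped in opposite directions.
first = ["M"] * 16 + ["M", "N"]
second = ["M"] * 16 + ["N", "M"]
observed, corrected = corrected_agreement(first, second)
print(round(observed, 2), round(corrected, 2))

# Invented percent-correct scores for three examinees on one subtest.
print(difference_score_percentages([100, 83, 67], [100, 67, 67]))
```

In this invented case 16 of 18 decisions agree (.89 uncorrected), but because nearly all examinees are classified as masters on both occasions, chance agreement is almost as high and the corrected value is slightly below zero. High raw agreement therefore does not guarantee consistent decisions once chance is taken into account.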

Criterion validity. The criterion validity of the BRT was
determined in two ways, employing the entire group of subjects (N=47).
The traditional psychometric strategy of correlating scores on the
measure of interest (BRT) with criterion measures was used. The SRA
Reading Achievement Test and the Word Reading Test were employed as
the criterion measures. Additionally, chi-square statistical tests
were applied to contingency tables wherein mastery-nonmastery
represented one dimension of each table and pre-post instructional
status represented the other dimension. Percentages of
misclassifications supplemented the chi-square tests.
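
The contingency-table analysis can be sketched as below. The table layout, the function names, and the counts are assumptions made for illustration; the text does not report cell counts or say whether a continuity correction was applied. Rows are taken to be instructional status (pre, post) and columns the BRT decision (nonmastery, mastery), so that misclassified examinees are pre-instruction masters and post-instruction nonmasters.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def percent_misclassified(table):
    """Percentage of examinees in the two off-diagonal (misclassified) cells."""
    (a, b), (c, d) = table
    return 100.0 * (b + c) / (a + b + c + d)

# Invented counts for 46 examinees (illustration only):
#                 nonmastery  mastery
# pre-instruction     15         8
# post-instruction     6        17
table = [[15, 8], [6, 17]]
print(round(chi_square_2x2(table), 2), round(percent_misclassified(table), 1))
```

The two statistics answer different questions. The chi-square statistic indicates whether BRT decisions are associated with instructional status at all, whereas the misclassification percentage indicates how often an individual decision would be wrong, which is the quantity of practical interest to a teacher.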

Results 

Table 3 is a display of students' mean scores and standard
deviations on each subtest of the BRT, on each subscale and the total
of the SRA Reading Achievement Test, and on the isolated word reading
and passage reading scales of the Word Reading Test.



Insert Table 3 about here



Consistency of Performance on Administrations of the Same Test

Test-retest reliability correlations on subtests of the BRT are
displayed in Table 4. For the decoding subtests, correlations were
low, ranging from .20 to .42; for the comprehension subtests,
correlations were low to moderate, ranging from .03 to .83; and for
the study/reference skills subtests, correlations were high, ranging
between .86 and .94.



Insert Table 4 about here



The second analysis of the consistency of performance involved
calculating the percentages of examinees who had different percentage
correct scores across the two administrations of the BRT. Figures 1-4
are graphic displays of the percentages of examinees displaying
various difference scores on each subtest of the BRT; Table 5
summarizes the information illustrated on the graphs. The range of
difference scores on the subtests fell between 0 and 83%. The
percentage of examinees with 0% difference scores on two
administrations ranged from 22 on an information appraising subtest to
85 on the word attack subtest. Across the decoding subtests, the mean
percentage of examinees with 0% difference scores was 65 (SD =
28.28); across the comprehension subtests, the mean percentage was
57.20 (SD = 14.96); across the study/reference skills subtests, the
mean percentage was 51.25 (SD = 18.76); and across all the subtests,
the mean percentage was 55.07 (SD = 17.92).



Insert Figures 1-4 and Table 5 about here



The third analysis of the consistency of performance addressed
consistency of mastery-nonmastery decisions across the two
administrations of the BRT. Table 6 is a display of the uncorrected
and corrected proportions of examinees placed into the same decision
category on the two administrations. On the decoding subtests, the
corrected proportions are low, with the proportion of agreement on the
Word Attack subtest 6% lower than chance and the proportion of
agreement on the Pronunciation subtest only 18% greater than chance.
On the comprehension subtests, the proportions of agreement were quite
variable, ranging from 15% lower than chance to 88% greater than
chance. On the study/reference skills subtests, proportions of
agreement were moderate to high, ranging from 51% to 78% greater than
chance.



Insert Table 6 about here 



Criterion Validity

Correlational analyses were conducted between the BRT subtests
and two criterion measures, the SRA Reading Achievement Test and the
Word Reading Test. Correlations between the BRT subtest and the SRA
subscale and total test scores are displayed in Table 7. They ranged
from .35 to .73 when SRA vocabulary subscale scores were involved,
from .19 to .70 when SRA comprehension subscale scores were employed,
and from .26 to .75 when SRA total scores were used. The average
correlation for BRT decoding subtests was .41 (SD = .02); for BRT
comprehension subtests, the average correlation was .52 (SD = .21),
and for BRT study/reference skills subtests, it was .57 (SD = .07).



Insert Table 7 about here 



Correlations between the BRT subtests and the Word Reading Test
subscale scores are displayed in Table 8. They ranged from .27 to .57
when isolated word reading scores were involved, and from .31 to .68
when passage reading scores were employed. The mean correlation for
the BRT decoding subtests was .34 (SD = .08); for the BRT
comprehension subtests, the mean correlation was .47 (SD = .13), and
for the BRT study/reference skills subtests, it was .56 (SD = .06).



Insert Table 8 about here 



Criterion validity also was examined by inspecting the relation
between mastery-nonmastery decisions on the BRT and actual pre-post
instructional status. Relevant chi-square values, p-values, and
percentages of misclassified students are displayed in Table 9.
Across the decoding subtests of the BRT, the average percentage of
misclassified students was 40.50 (SD = 3.54); across the comprehension
subtests, the average percentage was 39.00 (SD = 4.58); across the
study/reference skills subtests, it was 23.33 (SD = 8.51); and across
all the subtests, it was 33.50 (SD = 9.99).



Insert Table 9 about here



Discussion

The purpose of the current study was twofold. First, the study
was designed to describe the reliability and validity of a
criterion-referenced mastery test of a basal reading series. Second, by
examining this reliability and validity, both with traditional
correlational analyses and with alternative strategies developed
specifically for criterion-referenced instruments, this investigation
sought to contrast results and assess the appropriateness and
potential usefulness of each strategy.

With respect to its first purpose, the study examined two aspects
of the technical adequacy of the Houghton-Mifflin End-of-level 11
Basic Reading Test: the consistency of students' performance on two
administrations of the test, and the criterion validity of the test.
On both of these indices, the Houghton-Mifflin BRT appeared
inadequate.

Test-retest reliability coefficients indicated that, when the BRT
was administered twice within a short time interval, students'
performance was very inconsistent on the decoding subtests; none of
the correlations obtained for the decoding subtests even fell within
the acceptable range for making group decisions (Salvia & Ysseldyke,
1981). On the comprehension subtests, correlations were poor to fair,
with the correlation for only one subtest, Meaning Acquisition,
falling into the acceptable range for group decision making and with
none of the correlations high enough for making decisions about
individual students. On the study/reference skills subtests, however,
student performance was more consistent, with all correlations at .86 or
better.

Results of this traditional correlational analysis of consistency
of student performance across tests were corroborated with the
criterion-referenced strategy of examining the proportions of
examinees consistently classified into the same decision category. As
with the correlational analyses, on the decoding subtests the
proportions were low, at an average of only 6% better than chance
agreement. On the comprehension subtests, proportions were low to
moderate, with 57% greater than chance agreement on Literal
Comprehension, 15% less than chance agreement on Interpretive
Thinking, and a mean 62.33% greater than chance agreement on Meaning
Acquisition. On the study/reference skills subtests, proportions were
moderate to high with an average 66.25% greater than chance agreement.

When inspecting the consistency of test scores displayed in
Figures 1-4 and in Table 5, the percentages of examinees scoring the
same across two administrations of the BRT appear variable. There was
no identifiable pattern within BRT scales; the average percentage of
subjects scoring the same across all the subtests was 55. Given the
fact that there are only 6 to 12 items per subtest and given a mastery
criterion of 83% to 85% per subtest, a difference of one or two items
correct in an administration of the BRT subtest can result in
different mastery decisions. Thus, an average of 55% of subjects
scoring the same on two BRT administrations appears to be lower than
desirable.
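
The sensitivity of mastery decisions to one or two items can be made concrete with a short calculation. The sketch below assumes that mastery requires a percentage correct at or above the cutoff; the publisher's exact rounding rule is not stated in the text, and `min_items_for_mastery` is a hypothetical helper.

```python
def min_items_for_mastery(n_items, cutoff_percent):
    """Smallest number of correct items whose percentage meets the cutoff."""
    # Integer ceiling of cutoff_percent * n_items / 100.
    return -(-cutoff_percent * n_items // 100)

# Subtest lengths and cutoffs at the extremes reported for the BRT.
for n_items in (6, 12):
    for cutoff in (83, 85):
        needed = min_items_for_mastery(n_items, cutoff)
        print(n_items, cutoff, needed, round(100 * needed / n_items, 1))
```

Under this assumption a 6-item subtest allows at most one error at an 83% cutoff and none at 85%, and a 12-item subtest allows two errors at 83% and one at 85%. A single additional error on a retest is therefore enough to reverse a mastery decision for an examinee at the cutoff.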

The results of the three analyses indicate that the consistency
of student performance on the BRT is less than adequate and that
educators should exercise caution as they attempt, on the basis of one
administration of the BRT, to formulate decisions concerning whether
individual students should progress to more difficult instructional
material. While the study/reference skills subtests may be adequate
as a data base for making such decisions, the decoding and
comprehension subtests, which teachers may consider more critical for
formulating decisions about reading proficiency, were unreliable.

The criterion validity of the BRT also was examined. The
traditional correlational analyses indicated that the criterion
validity of the BRT with respect to the SRA Reading Achievement Test
and the Word Reading Test was poor to fair, with correlations falling
between .19 and .73. Correlations on the Interpretive Thinking
comprehension subtest were the lowest. Statistics for the decoding
subtests also were relatively low, whereas the figures for the
remaining comprehension and study/reference skills subtests were
somewhat higher. Correlations among measures of reading proficiency
frequently have been reported at high levels (Fuchs, Deno, & Marston,
in press; Fuchs, Fuchs, & Deno, 1982). This indicates that the
figures for the BRT are comparatively low and that performance on the
BRT is a relatively poor predictor of concurrent performance on other
measures of reading proficiency.

The criterion validity of the BRT also was investigated with the
criterion-referenced strategy of examining the relation between the
mastery-nonmastery classification on the BRT and actual pre-post
instructional status. Relatively high percentages of
misclassifications (15% to 43%) were found, suggesting limited utility
of the BRT for classifying students into groups for instruction within
the basal reader for which the BRT was designed.

Consequently, the current study casts doubt on the reliability
and validity of the Houghton-Mifflin End-of-level 11 Basic Reading
Test, and suggests that educators use this test with caution.
Educational tests are designed to sample an individual's behavior as
a basis for drawing generalizations concerning his/her functioning and
for making instructional decisions. When tests sample behavior in
meaningful (valid) and accurate (reliable) ways, they are useful for
such purposes. Although criterion-referenced tests may possess high
content and face validity, their meaningfulness and accuracy remain
empirical questions, an issue frequently ignored by
criterion-referenced test developers. By investigating the reliability and
validity of one criterion-referenced test, the present study (a)
documents the notion that content validity is a necessary but
insufficient aspect of criterion-referenced test adequacy, and (b)
underscores the importance of investigating the reliability and
validity of criterion-referenced tests as they are developed.

The second purpose of this study was to compare the
appropriateness and usefulness of traditional analyses with strategies
developed specifically for criterion-referenced tests. Findings
discussed above suggest that the two types of analyses tend to
corroborate and enhance each other, providing complementary
information. It appears that both strategies may be appropriate and
necessary for investigating and describing the reliability and
validity of criterion-referenced tests.

Footnote

¹In response to a written request for information concerning the
technical adequacy of the test studied here, publishers described the
field-testing that they had conducted. This response (a) alluded to,
but failed to describe, an item analysis of test data; and (b)
reported on a pre-posttest study in which students demonstrated an
average growth of 8.5 grade equivalent months in 7 chronological
months on the Gates-MacGinitie. Authors of the response stated that
"This tends to confirm that the use of [criterion-referenced]
tests ... to monitor effectiveness of instruction and reteaching
contributed to an appropriate rate of progress among students."
Table 1

Traditional and Alternative Studies of Reliability and Validity Reported in Manuals of Commercial
Criterion-referenced Tests and Basal Series Mastery Tests

Reported in Test Manuals

Traditional (correlational) analyses
  Reliability: inter-rater, alternate-form, internal consistency, test-retest
  Validity: construct, criterion

Alternative (criterion-referenced) analyses
  Reliability studies
  Validity studies



Diagnostic Inventory of 
Skills (1977) 

Diagnostic Inventory of 
Development (1977) 

1974) 

m and .Monitoring System 

ental Programming for infants 
jng children:' Assessment & 
ation (1977) 

rten Evaluation of Learning 
^al: A curricular approach to 
tlon (1963) 

Accomplishment Profile (1977) xfr 
Staircase (1976) 
es Master y 

1 11979) 

■81) 

-HiffHn (1974) 
resman (1981) 



es that a study was reported in the test manual.

Table 2

Examinees' Tasks on the Houghton-Mifflin End-of-Level
11 Basic Reading Test

Scale/Subtest               Examinees' Tasks

Decoding

  Word Attack               Read a sentence from which letter(s)
                            of one word have been deleted. From
                            an array of three choices, circle
                            the word that most nearly sounds like
                            the unfinished word.

  Pronunciation             Given a word in dictionary spelling,
                            select from three choices the word(s)
                            with the same vowel sounds as the
                            dictionary-spelled word.

Comprehension

  Literal Comprehension     1. Read a factual article comprising four
                               paragraphs. Then, identify each of 12
                               statements as either true or false with
                               respect to information provided in the
                               article.

  Interpretive Thinking     1. Read a paragraph, and (a) select the
                               main idea from a set of statements,
                               and (b) determine whether each distractor
                               is not the main idea because the
                               paragraph either fails to address the
                               statement or is broader than the statement.

  Meaning Acquisition       1. Given a sentence with an underlined
                               word and given meanings for the
                               underlined word, select the meaning that
                               best fits the sentence.

                            2. Given a sentence with an underlined
                               figure of speech, select from a set of
                               possible statements the one best
                               defining the figure of speech in the
                               sentence.

                            3. Given a sentence with an underlined
                               word containing a common prefix and
                               given three possible meanings, select
                               the best meaning for the underlined
                               word.

Reference/Study Skills

  Information Locating      1. Given a book's abbreviated index and a
                               set of questions, write page numbers
                               of the book on which a relevant answer
                               might be located for each question.

                            2. Given questions and an illustration of
                               a 21-volume encyclopedia, write the
                               volume number in which relevant
                               information might be located for each
                               question. Then, given questions and a
                               list of possible subheadings for the topic
                               Newspaper, write the subheading in which
                               a relevant answer might be located for
                               each question.

                            3. Given questions and an illustration of
                               a card catalog, identify the drawer in
                               which a relevant answer might be located
                               for each question. Then, given questions,
                               determine whether one would search for
                               an author, title, or subject card for a
                               relevant answer to each question.

                            4. Given questions and a 5-column, 10-row
                               table containing information on the first
                               10 presidents [...]

  Information Appraising    1. Identify whether statements are fact,
                               fiction, or both.

                            2. Given a set of opinion statements and
                               a set of persons with biographical
                               information, match the person best
                               qualified to make each opinion statement.

                            3. Identify whether or not a statement
                               contains vague statements, and if so,
                               underline the vague statement.

  Information Organizing    1. Read an article. Complete a partially
                               completed outline concerning the
                               article with three [...]
                               main topics, subtopics, and details.
Table 3

Student Performance on Measures of Reading Achievement

Test                                     Mean      SD

End-of-Level 11 Basic Reading Test
  Decoding Subtests
    Word Attack                          22.5      3.2
    Pronunciation                        17.9      6.1
    Decoding Composite                   40.4      8.0
  Comprehension Subtests
    Literal Comprehension                20.2      3.8
    Interpretive Thinking                19.7      5.5
    Meaning Acquisition                  62.3     11.5
    Comprehension Composite             102.2     17.8
  Study/Reference Skills Subtests
    Information Locating                 79.3     17.9
    Information Appraising               45.1     17.3
    Information Organizing               18.0      8.6
    Reference/Study Skill Composite     142.4     38.9

SRA Reading Achievement Test
    Vocabulary                           23.4      8.6
    Comprehension                        28.8     11.1
    Total                                51.5     18.1

Word Reading Test
    Isolated Word Reading                46.6     18.4
    Passage Reading                     117.8     34.5



Table 4

Test-retest Reliabilities for Houghton-Mifflin End-of-level 11
Basic Reading Test (N=20)

Subtest                                Reliability

Decoding Subtests
  Word Attack                             .42
  Pronunciation                           .20
  Decoding Composite                      .21

Comprehension Subtests
  Literal Comprehension                   .61
  Interpretive Thinking                   .03
  Meaning Acquisition                     .83
  Comprehension Composite                 .72

Study/Reference Skills Subtests
  Information Locating                    .94
  Information Appraising                  .86
  Information Organizing                  .93
  Reference/Study Skill Composite         .94



Table 5

Proportion of Subjects with Varying Percentages of Difference
Scores Across Two Administrations of the End-of-level 11
Basic Reading Test (N=20)

Percentage Difference Score

^Number of items on the test.



Table 6

Uncorrected and Corrected Proportion of Examinees (N=18) Placed
Into the Same Decision Categories on Two Administrations
of the End-of-level 11 Basic Reading Test

                                      Proportion of Examinees
                                   Uncorrected     Corrected for
Basic Reading Test                 Agreements      Chance^a

Decoding Subtests
  Word Attack                         .89            -.06
  Pronunciation                       .61             .18

Comprehension Subtests
  Literal Comprehension               .83             .57
  Interpretive Thinking               .72            -.15
  Meaning Acquisition
    Words                             .72             .31
    Figures of Speech                 .89             .68
    Affixes                           .94             .88

Study/Reference Skills Subtests
  Information Locating
    Index                             .89             .68
    Encyclopedia                      .89             .72
    Card Catalog                      .83             [illegible]
    Table                             .89             .68
  Information Appraising
    Fact/Fiction                      .89             .68
    Opinion Statements                .89             .78
    Value Expressions                 .78             .51
  Information Organizing              .89             .78

^a (Observed - Chance Proportions) / Maximum Value that (Observed -
Chance Proportions) Can Assume.
Table 7

Correlations Between Basic Reading Test and SRA Test Scores (N=42)

                                                   SRA
Basic Reading Test                  Vocabulary  Comprehension  Total

Decoding Subtests
  Word Attack                          .40          .38         .40
  Pronunciation                        .42          .44         .43
  Decoding Composite                   .48          .49         .49

Comprehension Subtests
  Literal Comprehension                .52          .61         .57
  Interpretive Thinking                .35          .19         .26
  Meaning Acquisition                  .73          .70         .75
  Comprehension Composite              .70          .64         .69

Study/Reference Skills Subtests
  Information Locating                 .67          .63         .65
  Information Appraising               .58          .55         .53
  Information Organizing               .54          .47         .51
  Reference/Study Skill Composite      .69          .63         .65


Table 8

Correlations Between Basic Reading Test and Word Reading
Test Scores (N=46)

                                         Word Reading Test
                                     Isolated Words    Passage

Decoding Subtests
  ...
  Information Organizing                  .52            .57
  Reference/Study Skills Composite        .57            .68
Total Test Score                          .57            .65


Table 9

Relation Between Houghton-Mifflin Basic Reading Tests
and Criterion Classification (N=46)

Figure 1. Displays of consistency of test scores on decoding subtests
of End-of-level 11 BRT.